Latency Analysis of Coded Computation Schemes over Wireless Networks
Large-scale distributed computing systems face two major bottlenecks that
limit their scalability: straggler delay caused by the variability of
computation times at different worker nodes and communication bottlenecks
caused by shuffling data across many nodes in the network. Recently, it has
been shown that codes can provide significant gains in overcoming these
bottlenecks. In particular, optimal coding schemes for minimizing latency in
distributed computation of linear functions and mitigating the effect of
stragglers were proposed for a wired network, where the workers can
simultaneously transmit messages to a master node without interference. In this
paper, we focus on the problem of coded computation over a wireless
master-worker setup with straggling workers, where only one worker can transmit
the result of its local computation back to the master at a time. We consider
three asymptotic regimes (determined by how the communication and computation
times scale with the number of workers) and precisely characterize the total
run-time of the distributed algorithm and optimum coding strategy in each
regime. In particular, for the regime of practical interest where the
computation and communication times of the distributed computing algorithm are
comparable, we show that the total run-time approaches a simple lower bound
that decouples computation and communication, and demonstrate that coded
schemes are $\Theta(\log n)$ times faster than uncoded schemes.
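To make the coding idea concrete, the following toy sketch illustrates generic MDS-style coded computation of a matrix-vector product (this is an illustrative construction with a random generator matrix, not the paper's wireless scheme; all sizes and names are hypothetical): the master recovers $A x$ from any $k$ of $n$ coded worker results, so it never waits for stragglers.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 3                       # 5 workers, any 3 results suffice
A = rng.standard_normal((6, 4))   # 6 rows, split into k = 3 row-blocks
x = rng.standard_normal(4)

# Encode: n coded row-blocks from k uncoded ones via generator G.
blocks = np.split(A, k)                          # each block: (2, 4)
G = rng.standard_normal((n, k))                  # random generator, MDS w.h.p.
coded_blocks = [sum(G[i, j] * blocks[j] for j in range(k)) for i in range(n)]

# Each worker i computes its coded product; suppose workers 1 and 4 straggle.
worker_out = {i: coded_blocks[i] @ x for i in range(n)}
arrived = [0, 2, 3]                              # first k results to arrive

# Decode: solve G_k @ B = Y for the uncoded block products B.
Gk = G[arrived, :]                               # (k, k), invertible w.h.p.
Y = np.stack([worker_out[i] for i in arrived])   # (k, 2)
B = np.linalg.solve(Gk, Y)                       # recovered block products
Ax = B.reshape(-1)                               # concatenate -> A @ x

assert np.allclose(Ax, A @ x)
```

In the wireless setting studied in the paper, the decoding threshold additionally interacts with the one-at-a-time communication constraint, which is what the regime analysis captures.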
EM for Mixture of Linear Regression with Clustered Data
Modern data-driven and distributed learning frameworks deal with diverse
massive data generated by clients spread across heterogeneous environments.
Indeed, data heterogeneity is a major bottleneck in scaling up many distributed
learning paradigms. In many settings, however, heterogeneous data may be
generated in clusters with shared structures, as is the case in several
applications such as federated learning where a common latent variable governs
the distribution of all the samples generated by a client. It is therefore
natural to ask how the underlying clustered structures in distributed data can
be exploited to improve learning schemes. In this paper, we tackle this
question in the special case of estimating $d$-dimensional parameters of a
two-component mixture of linear regressions problem where each of $m$ nodes
generates $n$ samples with a shared latent variable. We employ the well-known
Expectation-Maximization (EM) method to estimate the maximum likelihood
parameters from $m$ batches of dependent samples, each containing $n$
measurements. Discarding the clustered structure in the mixture model, EM is
known to require $\mathcal{O}(\log(mn/d))$ iterations to reach the statistical
accuracy of $\mathcal{O}(\sqrt{d/(mn)})$. In contrast, we show that if
initialized properly, EM on the structured data requires only $\mathcal{O}(1)$
iterations to reach the same statistical accuracy, as long as $m$ grows as
$e^{o(n)}$. Our analysis
establishes and combines novel asymptotic optimization and generalization
guarantees for population and empirical EM with dependent samples, which may be
of independent interest.
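The acceleration from clustering can be sketched on synthetic data: every measurement in a batch shares one latent sign, so the E-step pools the whole batch when computing the posterior, which makes the soft labels nearly exact after very few iterations. Below is a minimal illustration for a symmetric two-component mixture with known noise level (all dimensions and constants are hypothetical choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, n, sigma = 3, 200, 10, 0.5       # dim, batches, samples/batch, noise
beta_true = np.array([1.0, -2.0, 0.5])

# Batch i shares one latent sign z_i: y_it = z_i * <x_it, beta> + noise.
Z = rng.choice([-1.0, 1.0], size=m)
X = rng.standard_normal((m, n, d))
Y = Z[:, None] * (X @ beta_true) + sigma * rng.standard_normal((m, n))

def em_step(beta):
    # E-step: posterior of z_i = +1 pools ALL n measurements of batch i.
    s = (Y * (X @ beta)).sum(axis=1) / sigma**2
    w = 0.5 * (1.0 + np.tanh(s))        # = sigmoid(2s), numerically stable
    # M-step: least squares against soft batch labels 2w - 1 in [-1, 1].
    t = (2.0 * w - 1.0)[:, None]
    Xf = X.reshape(-1, d)
    yf = (t * Y).reshape(-1)
    return np.linalg.solve(Xf.T @ Xf, Xf.T @ yf)

beta = beta_true + 0.5 * rng.standard_normal(d)   # proper initialization
for _ in range(5):
    beta = em_step(beta)

# EM recovers beta_true (up to the global sign ambiguity) very quickly.
err = min(np.linalg.norm(beta - beta_true), np.linalg.norm(beta + beta_true))
assert err < 0.1
```

Because each posterior aggregates a whole batch, the soft labels concentrate exponentially fast in the batch size, which is the intuition behind the constant iteration count.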
Robust and Communication-Efficient Collaborative Learning
We consider a decentralized learning problem, where a set of computing nodes
aim at solving a non-convex optimization problem collaboratively. It is
well-known that decentralized optimization schemes face two major system
bottlenecks: stragglers' delay and communication overhead. In this paper, we
tackle these bottlenecks by proposing a novel decentralized and gradient-based
optimization algorithm named QuanTimed-DSGD. Our algorithm stands on two
main ideas: (i) we impose a deadline on the local gradient computations of each
node at each iteration of the algorithm, and (ii) the nodes exchange quantized
versions of their local models. The first idea robustifies the algorithm
against straggling nodes, and the second alleviates the communication
overhead. The key technical
contribution of our work is to prove that with non-vanishing noises for
quantization and stochastic gradients, the proposed method exactly converges to
the global optimum for convex loss functions, and finds a first-order
stationary point in non-convex scenarios. Our numerical evaluations of the
QuanTimed-DSGD on training benchmark datasets, MNIST and CIFAR-10, demonstrate
speedups of up to 3x in run-time, compared to state-of-the-art decentralized
optimization methods.
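The two ideas can be simulated on toy quadratic losses: each node mixes the quantized models of its ring neighbors, and a deadline limits how many stochastic gradient samples each node finishes per iteration. This is an illustrative sketch of the mechanism, not the exact QuanTimed-DSGD update or its parameter choices; all constants are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def quantize(x, delta=0.05):
    """Unbiased stochastic rounding to a grid with resolution delta."""
    low = np.floor(x / delta) * delta
    p = (x - low) / delta                 # probability of rounding up
    return low + delta * (rng.random(x.shape) < p)

# 4 nodes on a ring; node i holds f_i(x) = 0.5 * ||x - c_i||^2, so the
# global minimizer is the average of the c_i.
n_nodes, d = 4, 3
C = rng.standard_normal((n_nodes, d))
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])  # doubly stochastic ring mixing

x = np.tile(rng.standard_normal(d), (n_nodes, 1))
alpha, eps = 0.5, 0.02                    # consensus and gradient step sizes
for t in range(1000):
    mixed = W @ quantize(x)               # exchange QUANTIZED models
    # Deadline: node i only finishes k_i gradient samples in time; fewer
    # samples means a noisier (but still unbiased) gradient estimate.
    k = rng.integers(1, 6, size=n_nodes)
    noise = rng.standard_normal((n_nodes, d)) / np.sqrt(k)[:, None]
    grads = (x - C) + noise
    x = x + alpha * (mixed - x) - eps * grads

# Despite quantization and deadline noise, nodes approach the optimum.
assert np.linalg.norm(x.mean(axis=0) - C.mean(axis=0)) < 0.3
```

The stochastic rounding keeps the quantization error zero-mean, which is the property the exact-convergence analysis relies on.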
Variance-reduced Clipping for Non-convex Optimization
Gradient clipping is a standard training technique used in deep learning
applications such as large-scale language modeling to mitigate exploding
gradients. Recent experimental studies have demonstrated a fairly special
behavior in the smoothness of the training objective along its trajectory when
trained with gradient clipping. That is, the smoothness grows with the gradient
norm. This is in clear contrast to the well-established assumption in folklore
non-convex optimization, a.k.a. $L$-smoothness, where the smoothness is
assumed to be bounded by a constant globally. The recently introduced
$(L_0, L_1)$-smoothness is a more relaxed notion that captures such behavior in
non-convex optimization. In particular, it has been shown that under this
relaxed smoothness assumption, SGD with clipping requires
$\mathcal{O}(\epsilon^{-4})$ stochastic gradient computations to find an
$\epsilon$-stationary solution. In this paper, we employ a variance reduction
technique, namely SPIDER, and demonstrate that for a carefully designed
learning rate, this complexity is improved to $\mathcal{O}(\epsilon^{-3})$,
which is order-optimal. Our designed learning rate incorporates the clipping
technique to mitigate the growing smoothness. Moreover, when the objective
function is the average of $n$ components, we improve the best known bound on
the stochastic gradient complexity to $\mathcal{O}(\sqrt{n}\epsilon^{-2})$,
which is order-optimal as well.
In addition to being theoretically optimal, SPIDER with our designed parameters
demonstrates comparable empirical performance against variance-reduced methods
such as SVRG and SARAH in several vision tasks.
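The mechanics of SPIDER with a clipped learning rate can be sketched on a toy finite sum. The quadratic components below only exercise the estimator and the step rule, not the relaxed-smoothness theory; epoch length, batch size, and step constants are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 64, 5
A = rng.standard_normal((n, d))   # component f_i(x) = 0.5 * ||x - a_i||^2

def grad(x, idx):
    """Average gradient of the sampled components f_i."""
    return x - A[idx].mean(axis=0)

q, b, eta, gamma = 8, 8, 0.5, 1.0   # epoch length, batch, lr cap, clip level
x = rng.standard_normal(d)
x_prev = x.copy()
v = grad(x, np.arange(n))
for t in range(200):
    if t % q == 0:
        v = grad(x, np.arange(n))                 # periodic full gradient
    else:
        idx = rng.integers(0, n, size=b)
        v = v + grad(x, idx) - grad(x_prev, idx)  # SPIDER recursion
    x_prev = x
    # Clipped learning rate: plain eta when the estimate is small,
    # normalized steps of length gamma once ||v|| grows.
    step = min(eta, gamma / (np.linalg.norm(v) + 1e-12))
    x = x - step * v

# The final iterate is (near-)stationary: the full gradient is tiny.
assert np.linalg.norm(grad(x, np.arange(n))) < 1e-3
```

The recursion reuses the same sampled indices at consecutive iterates, so the correction term has low variance when the iterates move slowly, which is where the improved complexity comes from.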
Robust Federated Learning: The Case of Affine Distribution Shifts
Federated learning is a distributed paradigm that aims at training models
using samples distributed across multiple users in a network while keeping the
samples on users' devices with the aims of efficiency and protecting users'
privacy. In such settings, the training data is often statistically
heterogeneous and manifests various distribution shifts across users, which
degrades the performance of the learnt model. The primary goal of this paper is
to develop a robust federated learning algorithm that achieves satisfactory
performance against distribution shifts in users' samples. To achieve this
goal, we first consider a structured affine distribution shift in users' data
that captures the device-dependent data heterogeneity in federated settings.
This perturbation model is applicable to various federated learning problems
such as image classification where the images undergo device-dependent
imperfections, e.g. different intensity, contrast, and brightness. To address
affine distribution shifts across users, we propose a Federated Learning
framework Robust to Affine distribution shifts (FLRA) that is provably robust
against affine Wasserstein shifts to the distribution of observed samples. To
solve the FLRA's distributed minimax problem, we propose a fast and efficient
optimization method and provide convergence guarantees via a Gradient Descent
Ascent (GDA) method. We further prove generalization error bounds for the
learnt classifier to show proper generalization from the empirical distribution
samples to the true underlying distribution. We perform several numerical
experiments to empirically support FLRA. We show that an affine distribution
shift indeed suffices to significantly decrease the performance of the learnt
classifier in a new test user, and our proposed algorithm achieves a
significant gain in comparison to standard federated learning and adversarial
training methods.
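The minimax formulation can be made concrete with a toy gradient descent ascent loop: a global linear model plays against per-user affine perturbations $x \mapsto \Lambda_u x + \delta_u$, with a quadratic penalty standing in for the Wasserstein-style constraint. This is an illustrative sketch, not the FLRA algorithm itself; all constants are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, d = 5, 40, 3                       # users, samples per user, dimension
theta_true = np.array([0.5, -0.5, 0.5])
X = rng.standard_normal((m, n, d))
Y = X @ theta_true + 0.1 * rng.standard_normal((m, n))

lam = 5.0                                # penalty keeping shifts small
theta = np.zeros(d)                      # global model (minimizing player)
Lmb = np.tile(np.eye(d), (m, 1, 1))      # per-user shift x -> Lmb @ x + delta
delta = np.zeros((m, d))

eta_min, eta_max = 0.05, 0.02            # descent / ascent step sizes
for t in range(800):
    Xs = np.einsum('uij,utj->uti', Lmb, X) + delta[:, None, :]  # shifted inputs
    r = Xs @ theta - Y                                          # residuals (m, n)
    # Descent step on the model parameters.
    g_theta = np.einsum('ut,uti->i', r, Xs) / (m * n)
    theta = theta - eta_min * g_theta
    # Ascent step on each user's affine perturbation, penalized toward
    # the identity map (Lmb = I, delta = 0).
    g_L = np.einsum('ut,i,utj->uij', r, theta, X) / n - 2 * lam * (Lmb - np.eye(d))
    g_d = r.mean(axis=1)[:, None] * theta - 2 * lam * delta
    Lmb = Lmb + eta_max * g_L
    delta = delta + eta_max * g_d

# The robust model stays close to the true parameters on clean data.
assert np.linalg.norm(theta - theta_true) < 0.3
```

With a sufficiently large penalty the inner maximization is strongly concave, which is the regime in which descent-ascent schemes for such minimax problems admit convergence guarantees.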
Robust and Efficient Algorithms for Federated Learning and Distributed Computing
Training a large-scale model over a massive data set is an extremely computation- and storage-intensive task, e.g. training ResNet, with hundreds of millions of parameters, over the data set ImageNet, with millions of images. As a result, there has been significant interest in developing distributed learning strategies that speed up the training of learning models. Due to the growing computational power of the ecosystem of billions of mobile and computing devices, many future distributed learning systems will operate by storing data locally and pushing computation to the network edge. Unlike traditional centralized machine learning environments, however, machine learning at the edge is characterized by significant challenges, including (1) scalability, due to severe constraints on communication bandwidth and other resources including storage and energy, (2) robustness to stragglers and edge failures due to slow edge nodes, and (3) generalization of models to non-i.i.d. and heterogeneous data.

In this thesis, we focus on two important distributed learning frameworks, Federated Learning and Distributed Computing, with a shared goal in mind: how to provably address the critical challenges in such paradigms using novel techniques from distributed optimization, statistical learning theory, probability theory, and communication and coding theory, to advance the state of the art in efficiency, resiliency, and scalability. In the first part of the thesis, we devise three methods to mitigate communication cost, improve straggler resiliency, and achieve robustness to heterogeneous data in federated learning paradigms. Our main ideas are to employ model compression, adaptive device participation, and distributionally robust minimax optimization, respectively, for these challenges.
We characterize provable improvements for the proposed algorithms in terms of convergence speed, expected runtime, and generalization gaps.

Moving on to the second part, we consider important instances of distributed computing frameworks, such as distributed gradient aggregation, matrix-vector multiplication, and MapReduce-type computing tasks, and propose several algorithms to mitigate the aforementioned bottlenecks in these settings. The key idea in our designs is to introduce redundant and coded computation in an elaborate fashion in order to reduce the communication cost and the total runtime. In both parts, we also support our theoretical results with numerical experiments that demonstrate significant improvements.